Fix/reliable int mapping #30
Couple comments. I really should have put `test_hierarchy` in `test/common` -- one for later!
One last comment
Nice
Context
We used to hash data rows and give each cluster of data rows an identity equal to the hash of all its children. We're now moving to a more compact representation in which data rows are given an integer key, and we need a way of mapping each new cluster to a new, unique integer key. This is made trickier by the fact that we're parallelising the construction of hierarchical clusters.
Changes proposed in this pull request
Guidance to review
Because keys are always negative, it's possible to build a hierarchy where keys are themselves members of keyed sets, and it's easy to distinguish integers mapped to raw data points (which are non-negative) from integers that are keys to sets (which are negative). The salt makes this work with a parallel execution model, where each worker maintains its own key space, as long as each worker operates on disjoint subsets of positive integers. The salt and a key are combined via the Cantor pairing function.
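
For reviewers unfamiliar with the Cantor pairing function, here is a minimal sketch of the idea, not the code in this PR. `cantor_pair` is the standard formula; `salted_key`, the worker-local counter `local_id`, and the "negate and subtract one" step used to make keys negative are assumptions made for illustration only.

```python
def cantor_pair(a: int, b: int) -> int:
    """Standard Cantor pairing: maps each pair of non-negative ints to a unique non-negative int."""
    if a < 0 or b < 0:
        raise ValueError("cantor_pair is only defined for non-negative integers")
    return (a + b) * (a + b + 1) // 2 + b


def salted_key(salt: int, local_id: int) -> int:
    """Hypothetical helper: combine a worker's salt with a worker-local id.

    The pairing result is shifted by one and negated so the key is strictly
    negative and cannot collide with a non-negative raw data row id.
    (Assumption: the PR may derive negativity differently.)
    """
    return -(cantor_pair(salt, local_id) + 1)


# Two workers with different salts can never produce the same key,
# even when their local ids coincide.
worker_a = [salted_key(1, i) for i in range(3)]
worker_b = [salted_key(2, i) for i in range(3)]
assert not set(worker_a) & set(worker_b)
```

Because the pairing is injective on pairs of non-negative integers, distinct `(salt, local_id)` pairs always yield distinct keys, which is what lets each worker allocate keys without coordinating with the others.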
Checklist: